Dependency

The Radeon 9700 was the 8500's successor, and it improved on the 8500 incrementally. The vertex shader gained real conditional branching logic. Some of the limits were also relaxed; the number of available outputs and uniforms increased. The fragment shader's architecture remained effectively the same; the 9700 simply increased the limits. There were 8 textures available and 16 opcodes, and it could perform 4 passes over this set.

The GeForce FX, released in 2003, was a substantial improvement, both over the GeForce 3/4 and over the 9700 in terms of fragment processing. NVIDIA took a different approach to their fragment shaders; their fragment processor worked not entirely unlike modern shader processors do.

It read an instruction, which could be a math operation, conditional branch (they had actual branches in fragment shading), or texture lookup instruction. It then executed that instruction. The texture lookup could be from a set of 8 textures. And then it repeated this process on the next instruction. It was doing math computations in a way not entirely unlike a traditional CPU.

There was no real concept of a dependent texture access for the GeForce FX. The inputs to the fragment pipeline were simply the texture coordinates and colors from the vertex stage. If you used a texture coordinate directly to access a texture, it was fine with that. If you did some computations on that coordinate and then used the result to access a texture, it was just as fine with that. It was completely generic.
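To make the idea concrete, here is a toy model of such a processor, written in Python. It is only a sketch: the opcode names, the register names, and the use of one-dimensional "textures" modelled as plain functions are all invented for illustration and do not correspond to the FX's actual instruction set. Branching is left out for brevity. The point it shows is that a texture lookup takes its coordinate from any register, whether that register holds an interpolated input or the result of earlier math.

```python
def run_fragment_program(program, inputs, textures):
    """Execute a list of toy instructions, one after another.

    program  -- list of tuples: (opcode, destination, *operands)
    inputs   -- dict of interpolated per-fragment values (hypothetical names)
    textures -- list of callables standing in for bound textures
    """
    registers = dict(inputs)  # inputs and temporaries share one register file
    for op, dst, *args in program:
        if op == "mul":
            a, b = args
            registers[dst] = registers[a] * registers[b]
        elif op == "add":
            a, b = args
            registers[dst] = registers[a] + registers[b]
        elif op == "tex":
            # The coordinate may come from ANY register; the lookup does not
            # care whether it was interpolated or computed.
            unit, coord = args
            registers[dst] = textures[unit](registers[coord])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers


if __name__ == "__main__":
    # Two stand-in "textures": simple functions of a 1D coordinate.
    textures = [lambda u: u * 2.0, lambda u: u + 1.0]
    program = [
        ("tex", "r0", 0, "tc0"),     # lookup with an interpolated coordinate
        ("mul", "r1", "r0", "tc0"),  # math on the fetched value
        ("tex", "out", 1, "r1"),     # lookup with a computed coordinate
    ]
    result = run_fragment_program(program, {"tc0": 0.25}, textures)
    print(result["out"])  # prints 1.125
```

In the example program, the second lookup uses a coordinate derived from the result of the first. In the older terminology that would be a dependent access; in this model it is just another instruction.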

It also failed in the marketplace. This was due primarily to its lateness and its poor performance in high-precision computations. The FX was optimized for doing 16-bit math in its fragment shader; while it could do 32-bit math, it ran at half the speed when doing so. But Direct3D 9's shaders did not allow the user to specify the precision of computations; the specification required at least 24 bits of precision. To match this, NVIDIA had no choice but to force 32-bit math on all D3D 9 applications, making them run much slower on the FX than on ATI's competing hardware (the 9700 always used 24-bit precision math).
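A rough feel for what 16 bits of precision costs can be had on a CPU. The sketch below, in Python, rounds the texel-center coordinates of a texture to 16-bit and to 32-bit floating point using the standard struct module, then counts how many coordinates remain distinct. It assumes that the IEEE half-precision layout (10 mantissa bits) is a fair stand-in for the FX's 16-bit format, and the 4096-texel width is an arbitrary choice; the function names are made up for this example.

```python
import struct


def round_to_half(x):
    """Round a Python float to the nearest 16-bit (half-precision) float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_single(x):
    """Round a Python float to the nearest 32-bit (single-precision) float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def distinct_texel_columns(width, rounder):
    """Count texel columns still distinguishable after rounding.

    Each column's center sits at (i + 0.5) / width in normalized
    coordinates. If two centers round to the same value, a shader working
    at that precision cannot address them separately.
    """
    centers = {rounder((i + 0.5) / width) for i in range(width)}
    return len(centers)


if __name__ == "__main__":
    width = 4096  # arbitrary example size
    print("32-bit:", distinct_texel_columns(width, round_to_single))
    print("16-bit:", distinct_texel_columns(width, round_to_half))
```

At 32 bits every column keeps its own coordinate. At 16 bits many neighbouring columns collapse onto the same value, which is the kind of loss a minimum-precision requirement is meant to rule out.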

Things were no better in OpenGL land. The two competing unified fragment processing APIs, GLSL and an assembly-like fragment shader language, did not have precision specifications either. Only NVIDIA's proprietary extension for fragment shaders provided that, and developers were less likely to use it, especially given the head start that the 9700 had gained in the market from the FX's late release.

It performed so poorly in the market that NVIDIA dropped the FX name for the next hardware revision. The GeForce 6 improved its 32-bit performance to the point where it was competitive with the ATI equivalents.

This level of hardware gained a number of different features. sRGB textures and framebuffers appeared, as did floating-point textures. Blending support for floating-point framebuffers was somewhat spotty; some hardware could do it only for 16-bit floating-point, some could not do it at all. The restriction to power-of-two texture sizes was also lifted, to varying degrees. None of ATI's hardware of this era fully supported non-power-of-two textures when used with mipmapping, but NVIDIA's hardware from the GeForce 6 onward did.

The ability to access textures from vertex shaders was also introduced in this series of hardware. Vertex texture accesses use a separate list of textures from those bound for fragment shaders. Only four textures could be accessed from a vertex shader, while eight textures were normal for fragment shaders.

Render to texture also became generally available at this time, though this was more of an API issue (neither OpenGL nor Direct3D allowed textures to be used as render targets before this point) than hardware functionality. That is not to say that hardware had no role to play. Textures are often not stored as linear arrays of memory the way they are loaded with glTexImage. They are usually stored in a swizzled format, where 2D or 3D blocks of texture data are stored sequentially. Thus, rendering to a texture required either the ability to render directly to swizzled formats or the ability to read textures that are stored in unswizzled formats.
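Actual swizzle patterns were specific to each piece of hardware. As an illustration only, the sketch below, in Python, uses Morton (Z-order) indexing, one well-known way of keeping 2D neighbours close together in memory; it is not claimed to be the layout any particular GPU used. The function names are hypothetical, and the linear version assumes a simple row-major image.

```python
def linear_index(x, y, width):
    """Row-major offset: the layout a plain array of texels would use."""
    return y * width + x


def morton_index(x, y, bits=16):
    """Z-order offset: interleave the bits of x and y.

    Bit i of x lands in bit 2*i of the result, and bit i of y lands in
    bit 2*i + 1, so every aligned 2x2 block (and every aligned 4x4 block,
    and so on) occupies a contiguous run of offsets.
    """
    index = 0
    for bit in range(bits):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


if __name__ == "__main__":
    width = 256  # arbitrary example width
    for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print((x, y), linear_index(x, y, width), morton_index(x, y))
```

In the row-major layout the texel directly below (0, 0) is a full row away, 256 elements in this example, whereas in the Z-order layout the four texels of the 2x2 block sit at offsets 0 through 3. A renderer that writes rows linearly and a texture unit that expects the interleaved layout disagree about where each texel lives, which is why one side or the other had to adapt.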

Render to texture was not the only addition. Hardware of this era also gained the ability to render to multiple textures or buffers at one time. The number of renderable buffers was generally limited to 4 across all hardware platforms.